Discovering Frequent Substructures in Large Unordered Trees
Authors
Abstract
In this paper, we study a frequent substructure discovery problem in semi-structured data. We present an efficient algorithm, Unot, that computes all frequent labeled unordered trees appearing in a large collection of data trees with frequency above a user-specified threshold. The keys of the algorithm are the efficient enumeration of all unordered trees in canonical form and the incremental computation of their occurrences. We then show that Unot spends O(kbm) time per frequent pattern T, where k is the size of T, b is the branching factor of the data trees, and m is the total number of occurrences of T in the data trees.
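The first key idea in the abstract is to enumerate each unordered tree only once by working with a single canonical ordered representative. The short sketch below illustrates that idea. It is not Unot itself. It assumes a nested-tuple tree encoding, the hypothetical helper names `preorder_sequence`, `canonical_sequence` and `is_canonical`, and a canonical form defined as the lexicographically largest depth-label sequence (preorder list of (depth, label) pairs, compared by depth and then label). The paper's exact conventions may differ.

```python
from typing import List, Tuple

# Assumed encoding (illustrative only): a labeled tree is (label, [children]).
Tree = Tuple[str, list]
DepthLabel = Tuple[int, str]


def preorder_sequence(tree: Tree, depth: int = 0) -> List[DepthLabel]:
    """Depth-label sequence of the tree in its current sibling order."""
    label, children = tree
    seq: List[DepthLabel] = [(depth, label)]
    for child in children:
        seq.extend(preorder_sequence(child, depth + 1))
    return seq


def canonical_sequence(tree: Tree, depth: int = 0) -> List[DepthLabel]:
    """Depth-label sequence with siblings reordered so that the whole
    sequence is lexicographically largest. Trees that differ only in
    sibling order (isomorphic unordered trees) get the same result."""
    label, children = tree
    # Canonicalize every child subtree first (bottom-up).
    child_seqs = [canonical_sequence(c, depth + 1) for c in children]
    # Each child sequence starts with a pair at depth + 1 followed only by
    # deeper pairs, so sorting in descending order gives the largest
    # concatenation, even when one sequence is a prefix of another.
    child_seqs.sort(reverse=True)
    seq: List[DepthLabel] = [(depth, label)]
    for s in child_seqs:
        seq.extend(s)
    return seq


def is_canonical(tree: Tree) -> bool:
    """True if the tree's current sibling order is already canonical."""
    return preorder_sequence(tree) == canonical_sequence(tree)


if __name__ == "__main__":
    t1 = ("A", [("B", []), ("C", [("D", [])])])
    t2 = ("A", [("C", [("D", [])]), ("B", [])])
    print(canonical_sequence(t1))
    print(is_canonical(t1), is_canonical(t2))
```

Here `t1` and `t2` are the same unordered tree with different sibling orders, so both map to `[(0, 'A'), (1, 'C'), (2, 'D'), (1, 'B')]`, and only `t2` passes `is_canonical`. An enumerator that keeps only canonical candidates therefore never reports the same unordered pattern twice; an efficient enumerator of the kind the abstract describes would maintain this property incrementally rather than re-sorting every candidate.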
Similar resources
Efficient Tree Mining Using Reverse Search
In this paper, we review our data mining algorithms for discovering frequent substructures in a large collection of semi-structured data, where both the patterns and the data are modeled by labeled trees. These algorithms, namely FREQT for mining frequent ordered trees and UNOT for mining frequent unordered trees, efficiently enumerate all frequent tree patterns without duplicates using reve...
Efficient Discovery of Frequent Unordered Trees
Recently, an algorithm called Freqt was introduced which enumerates all frequent induced subtrees in an ordered data tree. We propose a new algorithm for mining unordered frequent induced subtrees. We show that the complexity of enumerating unordered trees is not higher than the complexity of enumerating ordered trees; a strategy for determining the frequency of unordered trees is introduced.
Efficient Algorithms for Discovering Frequent and Maximal Substructures from Large Semistructured Data
In this paper, we review recent advances in efficient algorithms for semi-structured data mining, that is, the discovery of rules and patterns from structured data such as sets, sequences, trees, and graphs. After introducing basic definitions and problems, we present efficient algorithms for frequent and maximal pattern mining for classes of sets, sequences, and trees. In particular, we explain ge...
Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees
Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees which is motivated by such important applications as mining “hot” spots of Web sites from Web usage logs and discovering significant “deep” structures from tree-like...
Mining Frequent Closed Unordered Trees Through Natural Representations
Many knowledge representation mechanisms consist of link-based structures; they may be studied formally by means of unordered trees. Here we consider the case where labels on the nodes are nonexistent or unreliable, and propose data mining processes focusing on just the link structure. We propose a representation of ordered trees, describe a combinatorial characterization and some properties, an...